Search results for "Instruction set"

Showing 10 items of 16 documents

First Results of Hyperspectral Scene Generation in Preparation of the Chime Imaging Spectrometer Mission

2021

End-to-end mission performance simulators (E2Es) are software tools developed to support satellite mission preparatory activities. For passive remote sensing missions, E2Es generate synthetic scenes that simulate the interaction of solar radiation with the atmosphere and the surface, thereby allowing mission performance to be estimated before launch. In this paper, we present the CHIME Scene Generator Module (SGM), part of the CHIME E2E, whose state-of-the-art parallelization and optimization make it possible to obtain a whole year of daily worldwide top-of-atmosphere radiance images in a matter of hours. The CHIME SGM generates 100 × 200 km hyperspectral scenes w…

Keywords: 010504 meteorology & atmospheric sciences; Computer science; business.industry; Real-time computing; 0211 other engineering and technologies; Imaging spectrometer; Hyperspectral imaging; 02 engineering and technology; 01 natural sciences; Convolution; Instruction set; Software; Shadow; Radiance; Satellite; business; 021101 geological & geomatics engineering; 0105 earth and related environmental sciences
Published in: 2021 IEEE International Geoscience and Remote Sensing Symposium (IGARSS)

S-Aligner: Ultrascalable Read Mapping on Sunway Taihu Light

2017

The availability and amount of sequenced genomes have been growing rapidly in recent years because of the adoption of next-generation sequencing (NGS) technologies, which enable high-throughput short-read generation at highly competitive cost. Since this trend is expected to continue in the foreseeable future, the design and implementation of efficient and scalable NGS bioinformatics algorithms are important to research and industrial applications. In this paper, we introduce S-Aligner, a highly scalable read mapper designed for the Sunway Taihu Light supercomputer and its fourth-generation ShenWei many-core architecture (SW26010). S-Aligner employs a combination of optimization techniques to o…

Keywords: 0301 basic medicine; Instruction set; 03 medical and health sciences; 030104 developmental biology; Xeon; Asynchronous communication; Computer science; Multithreading; Scalability; SIMD; Parallel computing; SW26010; Supercomputer
Published in: 2017 IEEE International Conference on Cluster Computing (CLUSTER)

Architectural improvements and FPGA implementation of a multimodel neuroprocessor

2003

Since neural networks (NNs) require an enormous amount of learning time, various kinds of dedicated parallel computers have been developed. In this paper, a 2-D systolic array (SA) of dedicated processing elements (PEs), also called systolic cells (SCs), is presented as the heart of a multimodel neural-network accelerator. The instruction set of the SA allows the implementation of several neural algorithms, including error backpropagation and a self-organizing feature map algorithm. Several special architectural features are presented to improve the performance of the 2-D SA. A weight-matrix swapping mechanism allows the implementation of NNs larger than the 2-D SA. A systo…

Keywords: Instruction set; Artificial neural network; Computer architecture; Computer science; Feature (machine learning); Systolic array; Parallel computing; Difference-map algorithm; Field-programmable gate array; Backpropagation; Word (computer architecture)
Published in: Proceedings of the 9th International Conference on Neural Information Processing, 2002. ICONIP '02.

LightSpMV: Faster CSR-based sparse matrix-vector multiplication on CUDA-enabled GPUs

2015

Compressed sparse row (CSR) is a frequently used format for sparse matrix storage. However, the state-of-the-art CSR-based sparse matrix-vector multiplication (SpMV) implementations on CUDA-enabled GPUs do not exhibit very high efficiency. This has motivated the development of some alternative storage formats for GPU computing. Unfortunately, these alternatives are incompatible with most CPU-centric programs and require dynamic conversion from CSR at runtime, thus incurring significant computational and storage overheads. We present LightSpMV, a novel CUDA-compatible SpMV algorithm using the standard CSR format, which achieves high speed by benefiting from the fine-grained dynamic distribut…

Keywords: Instruction set; CUDA; Speedup; Computer science; Sparse matrix-vector multiplication; Double-precision floating-point format; Parallel computing; General-purpose computing on graphics processing units; Row; Sparse matrix
Published in: 2015 IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP)

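The CSR layout named in the LightSpMV abstract can be illustrated with a minimal sequential sketch, assuming the matrix is held in three plain Python lists (`row_ptr`, `col_idx`, `values`; these names are illustrative, not taken from the paper). Each output row is an independent dot product over that row's nonzeros, which is the unit of work GPU CSR kernels distribute across threads or warps. LightSpMV's fine-grained dynamic row distribution is not reproduced here.

```python
from typing import List


def csr_spmv(row_ptr: List[int], col_idx: List[int],
             values: List[float], x: List[float]) -> List[float]:
    """Compute y = A @ x for a matrix A stored in CSR format (sequential sketch)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # The nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i+1]-1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Hypothetical example matrix:
# A = [[10, 0, 2],
#      [ 0, 3, 0],
#      [ 1, 0, 4]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [10.0, 2.0, 3.0, 1.0, 4.0]
print(csr_spmv(row_ptr, col_idx, values, [1.0, 1.0, 1.0]))  # [12.0, 3.0, 5.0]
```

Because rows can have very different nonzero counts, a static one-thread-per-row mapping of this loop can leave GPU threads idle, which is the kind of imbalance dynamic distribution schemes aim to reduce.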

Bit-Parallel Approximate Pattern Matching on the Xeon Phi Coprocessor

2014

Bit-parallel pattern matching encodes computed values in bit arrays. This approach gains its efficiency by performing multiple updates within a single machine word. An important parameter is therefore the machine word size (e.g., 32 or 64 bits). With the increasing length of vector registers, efficiently mapping bit-parallel pattern matching algorithms onto modern high-performance computing architectures is becoming increasingly important. In this paper, we investigate an efficient implementation of the Wu-Manber approximate pattern matching algorithm on the Intel Xeon Phi coprocessor. This architecture features a 512-bit vector processing unit (VPU) as well as a large number of process…

Keywords: Instruction set; Coprocessor; Speedup; Computer science; Parallel computing; Pattern matching; Intrinsics; Word (computer architecture); Xeon Phi; Vector processor
Published in: 2014 IEEE 26th International Symposium on Computer Architecture and High Performance Computing

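The idea of performing multiple updates within one machine word can be seen in a minimal sketch of the textbook Shift-And formulation of the Wu-Manber algorithm, assuming unit-cost edits (substitution, insertion, deletion) and using a Python integer masked to m bits in place of a fixed 32-, 64- or 512-bit register. The function name `wu_manber_search` is hypothetical, and this is not the paper's Xeon Phi implementation. Bit `i` of `R[d]` records whether the first `i + 1` pattern characters match a suffix of the text read so far with at most `d` errors, so one shift-and-OR step updates all pattern positions at once.

```python
from typing import Dict, List


def wu_manber_search(text: str, pattern: str, k: int) -> List[int]:
    """Return 0-based end positions in `text` where `pattern` matches with at most k edits."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    m = len(pattern)
    full = (1 << m) - 1      # keep states to m bits, like a fixed-width word
    accept = 1 << (m - 1)    # set when the whole pattern has matched

    # masks[c] has bit i set wherever pattern[i] == c.
    masks: Dict[str, int] = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)

    # Initially, the first d pattern characters can be matched by d deletions.
    R = [(1 << d) - 1 for d in range(k + 1)]
    hits: List[int] = []
    for pos, ch in enumerate(text):
        b = masks.get(ch, 0)
        prev_old = R[0]                      # R[d-1] before reading ch
        R[0] = ((R[0] << 1) | 1) & b         # exact matching, no errors
        for d in range(1, k + 1):
            cur_old = R[d]
            R[d] = ((((cur_old << 1) | 1) & b)   # match
                    | (prev_old << 1)            # substitution
                    | prev_old                   # insertion (extra text char)
                    | (R[d - 1] << 1)            # deletion (skipped pattern char)
                    | 1) & full                  # pattern[0] is always within 1 edit
            prev_old = cur_old
        if R[k] & accept:
            hits.append(pos)
    return hits


print(wu_manber_search("abcabc", "abc", 0))  # [2, 5]
print(wu_manber_search("abxc", "abc", 1))    # [1, 2, 3]
```

In the second call, position 1 ends "ab" (one deletion), position 2 ends "abx" (one substitution) and position 3 ends "abxc" (one insertion). A SIMD version would pack several such state words side by side in a wide vector register.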

Design methods of multithreaded architectures for multicore microcontrollers

2011

Advances in electronic technology have enabled the implementation of complex architectures, leading to the emergence of multicore processor technology. Multicore architectures are built from superscalar and multithreaded processors. Integrating new technologies into embedded applications requires multicore processors that fit into an area as small as that of a classic microcontroller. These processors must work with fewer resources while handling multiple tasks simultaneously. In this paper, we present a method for modeling, simulating, and evaluating two multithreaded architectures with limited resources that could be integrated into embedded sys…

Keywords: Instruction set; Microcontroller; Multi-core processor; Computer architecture; Computer science; Multithreading; Context (language use); Electronics; Computer multitasking; ComputerSystemsOrganization_PROCESSORARCHITECTURES; Temporal multithreading
Published in: 2011 6th IEEE International Symposium on Applied Computational Intelligence and Informatics (SACI)

Spectral evolution simulation on leading multi-socket, multicore platforms

2011

Spectral evolution simulations based on the observed Very Long Baseline Interferometry (VLBI) radio maps are of paramount importance for understanding the nature of extragalactic objects in astrophysics. This work analyzes the performance and scaling of a spectral evolution algorithm on three leading multi-socket, multi-core architectures. We evaluate three parallel models with different levels of data sharing: a sharing approach, a privatizing approach, and a hybrid approach. Our experiments show that the data-privatizing model is reasonably efficient on medium-scale multi-socket, multi-core systems (up to 48 cores), while, regardless of algorithmic and scheduling optimizations, the sharing approach is …

Keywords: Instruction set; Multi-core processor; Spectral evolution; Computer science; Distributed computing; Scalability; Very-long-baseline interferometry; Scaling; Scheduling (computing)
Published in: 2011 18th International Conference on High Performance Computing

SWAPHI-LS: Smith-Waterman Algorithm on Xeon Phi coprocessors for Long DNA Sequences

2014

As an optimal method for sequence alignment, the Smith-Waterman (SW) algorithm is widely used. Unfortunately, this algorithm is computationally demanding, especially for long sequences. This has motivated the investigation of its acceleration on a variety of high-performance computing platforms. However, most work in the literature is suitable only for short sequences. In this paper, we present SWAPHI-LS, the first parallel SW algorithm exploiting emerging Xeon Phi coprocessors to accelerate the alignment of long DNA sequences. In SWAPHI-LS, we have investigated three parallelization approaches (naive, tiled, and distributed) to thoroughly explore the inherent parallelism within Xeon P…

Keywords: Instruction set; Smith–Waterman algorithm; Coprocessor; Xeon; Computer science; Data parallelism; Task parallelism; Parallel computing; SIMD; Intrinsics; Instruction-level parallelism; Xeon Phi
Published in: 2014 IEEE International Conference on Cluster Computing (CLUSTER)

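The Smith-Waterman recurrence that SWAPHI-LS parallelizes can be sketched in a few lines, assuming a linear gap penalty and hypothetical default scores (match 2, mismatch -1, gap -2) rather than whatever scoring scheme the paper uses. The function name is also hypothetical. Each cell depends only on its left, upper and upper-left neighbors, so all cells on one anti-diagonal are independent; that independence is what SIMD and tiled parallelizations exploit. Keeping only two rows makes memory linear in `len(b)`, which matters for long sequences.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local-alignment score of `a` and `b` with a linear gap penalty."""
    prev = [0] * (len(b) + 1)   # row i-1 of the DP matrix H
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap          # a[i-1] aligned against a gap
            left = cur[j - 1] + gap     # b[j-1] aligned against a gap
            # Clamping at 0 lets an alignment start anywhere (local alignment).
            cur[j] = max(0, diag, up, left)
            best = max(best, cur[j])
        prev = cur
    return best


print(smith_waterman_score("TTACGTTT", "ACGT"))  # 8
```

This returns only the score; recovering the alignment itself would require a traceback, which for long sequences is usually done with linear-space techniques rather than by storing the full matrix.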

Multiple modular very long instruction word processors based on field programmable gate arrays

2007

Modern field programmable gate array (FPGA) chips, with their large memory capacity and reconfigurability potential, are opening new frontiers in rapid prototyping of embedded systems. With the advent of high-density FPGAs, it is now possible to implement a high-performance very long instruction word (VLIW) processor core in an FPGA. This paper describes research results on enabling the DSP TMS320 C6201 model for real-time image processing applications by exploiting FPGA technology. We present a modular DSP C6201 VHDL model with a variable instruction set. We call this new development a minimum mandatory modules (M3) approach. Our goals are to keep the flexibility of DSP in order to shor…

Keywords: Multi-core processor; Computer science; business.industry; Reconfigurability; Modular design; Atomic and Molecular Physics and Optics; Computer Science Applications; Instruction set; Parallel processing (DSP implementation); Computer architecture; Very long instruction word; Embedded system; VHDL; Hardware_ARITHMETICANDLOGICSTRUCTURES; Electrical and Electronic Engineering; Field-programmable gate array; business; computer; computer.programming_language
Published in: Journal of Electronic Imaging

Flexible VLIW processor based on FPGA for real-time image processing

2011

Modern FPGA chips, with their larger memory capacity and reconfigurability potential, are opening new frontiers in rapid prototyping of embedded systems. With the advent of high-density FPGAs, it is now possible to implement a high-performance Very Long Instruction Word (VLIW) processor core in an FPGA. In a VLIW architecture, processor effectiveness depends on the ability of compilers to extract sufficient Instruction-Level Parallelism (ILP) from program code. This paper describes research results on enabling the VLIW processor model for real-time processing applications by exploiting FPGA technology. Our goals are to keep the flexibility of processors in order to shorten the developm…

Keywords: Multi-core processor; business.industry; Computer science; Application-specific instruction-set processor; Reconfigurability; Instruction set; Computer architecture; Very long instruction word; Embedded system; VHDL; business; Instruction-level parallelism; computer; computer.programming_language; FPGA prototype
Published in: Proceedings of the 2011 Conference on Design & Architectures for Signal & Image Processing (DASIP)

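The dependence of VLIW performance on compiler-extracted ILP can be illustrated with a toy greedy list scheduler that packs independent operations into fixed-width instruction bundles. This is a minimal sketch under loudly stated assumptions: every operation takes one cycle, all issue slots are identical, and the function `schedule_bundles` and its dependency-map input are hypothetical, not the compiler or processor model used in the paper.

```python
from typing import Dict, List, Set


def schedule_bundles(deps: Dict[str, List[str]], width: int) -> List[List[str]]:
    """Greedily pack operations into VLIW bundles of at most `width` slots.

    `deps` maps each operation to the operations whose results it reads.
    Assumes single-cycle operations and identical slots (hypothetical model).
    """
    if width < 1:
        raise ValueError("bundle width must be at least 1")
    done: Set[str] = set()
    remaining = list(deps)   # dict order gives a deterministic priority
    bundles: List[List[str]] = []
    while remaining:
        # An operation is ready once all of its inputs were issued in earlier bundles.
        ready = [op for op in remaining if all(d in done for d in deps[op])]
        if not ready:
            raise ValueError("dependency cycle or unknown operand")
        bundle = ready[:width]
        bundles.append(bundle)
        done.update(bundle)
        remaining = [op for op in remaining if op not in bundle]
    return bundles


# Hypothetical program: c = a op b, e = c op d
deps = {"a": [], "b": [], "c": ["a", "b"], "d": [], "e": ["c", "d"]}
print(schedule_bundles(deps, 2))  # [['a', 'b'], ['c', 'd'], ['e']]
print(schedule_bundles(deps, 4))  # [['a', 'b', 'd'], ['c'], ['e']]
```

Widening the bundle from 2 to 4 slots does not shorten this schedule, because the dependency chain a → c → e still needs three cycles; extra slots only help when the compiler can find enough independent operations to fill them.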